Standardize internal version checks in cuda.core #1825
Andy-Jost merged 10 commits into NVIDIA:main from
Conversation
Move the GraphBuilder/Graph/GraphCompleteOptions/GraphDebugPrintOptions implementation out of _graph/__init__.py into _graph/_graph_builder.pyx so it is compiled by Cython. A thin __init__.py re-exports the public names so all existing import sites continue to work unchanged.

Cython compatibility adjustments:
- Remove `from __future__ import annotations` (unsupported by Cython)
- Remove TYPE_CHECKING guard; quote annotations that reference Stream (circular import), forward-reference GraphBuilder/Graph, or use X | None union syntax
- Update _graphdef.pyx lazy imports to point directly at _graph_builder

No build_hooks.py changes needed: the build system auto-discovers .pyx files via glob.

Ref: NVIDIA#1076
Made-with: Cursor
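For illustration, the annotation-quoting technique looks like this in plain Python (the class and method names below are made up for the sketch, not the actual cuda.core API):

```python
# Quoted annotations stay plain strings at runtime, so forward references
# and `X | None` unions work without `from __future__ import annotations`
# (which Cython does not support).
class GraphBuilder:
    def split(self, count: int) -> "tuple[GraphBuilder, ...]":
        # Forward-references GraphBuilder inside its own body via a string
        return tuple(GraphBuilder() for _ in range(count))

    def parent(self) -> "GraphBuilder | None":
        # Union syntax is also safe when quoted
        return None
```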
Replace the per-module _lazy_init / _inited / _driver_ver / _py_major_minor pattern in _graph_builder.pyx with direct calls to centralized cached functions in cuda_utils:
- Add get_driver_version() with @functools.cache alongside get_binding_version
- Switch get_binding_version from @functools.lru_cache to @functools.cache (cleaner for nullary functions)
- Fix split() to return tuple(result): Cython enforces return type annotations, unlike pure Python
- Fix the _cond_with_params annotation from -> GraphBuilder to -> tuple to match the actual return value

Made-with: Cursor
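A minimal pure-Python sketch of the cached nullary pattern this commit moves to (the hard-coded version string is a stand-in; the real code reads the cuda-bindings package metadata):

```python
import functools

@functools.cache  # for a nullary function, cache is cleaner than lru_cache
def binding_version() -> tuple[int, int, int]:
    raw = "12.9.0"  # stand-in for importlib.metadata.version("cuda-bindings")
    parts = raw.split(".")[:3]
    # Return a real tuple: Cython enforces the declared return type
    return tuple(int(p) for p in parts)
```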
Implements NVIDIA#1328: host callbacks during stream capture via cuLaunchHostFunc, mirroring the existing GraphDef.callback API. Extracts shared callback infrastructure (_attach_user_object, _attach_host_callback_to_graph, trampoline/destructor) into a new _graph/_utils.pyx module to avoid circular imports between _graph_builder and _graphdef. Made-with: Cursor
Move binding and driver version queries into a dedicated
cuda/core/_utils/version.{pyx,pxd} module, providing both Python
(binding_version, driver_version) and Cython (cy_binding_version,
cy_driver_version) entry points. All functions return version tuples
((major, minor, patch)) and are cached—Python via @functools.cache,
Cython via module-level globals.
Remove get_binding_version / get_driver_version from cuda_utils.pyx
and update all internal call sites and tests to import from the new
module. Remove version checks for CUDA < 12.0 (now the minimum) and
eliminate dead code exposed by the migration: _lazy_init / _use_ex /
_kernel_ctypes / _is_cukernel_get_library_supported machinery in
_module.pyx, _launcher.pyx, and _launch_config.pyx.
The public NVML-based system.get_driver_version API is unrelated and
left unchanged.
Made-with: Cursor
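A pure-Python sketch of the two caching strategies described above; the 12060 value is an illustrative placeholder for the real driver query, not a call the sketch makes:

```python
import functools

_CY_DRIVER_VER = None  # Cython side: a module-level global serves as the cache

def cy_driver_version():
    """Cython-style entry point: compute once, reuse the module-level global."""
    global _CY_DRIVER_VER
    if _CY_DRIVER_VER is None:
        ver = 12060  # placeholder for handle_return(driver.cuDriverGetVersion())
        _CY_DRIVER_VER = (ver // 1000, (ver // 10) % 100, ver % 10)
    return _CY_DRIVER_VER

@functools.cache  # Python side: functools.cache memoizes the nullary call
def driver_version():
    return cy_driver_version()
```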
Made-with: Cursor

# Conflicts:
#	cuda_core/cuda/core/_graph/_graph_builder.pyx
#	cuda_core/cuda/core/_utils/cuda_utils.pyx
Remove unused imports flagged by cython-lint and ruff after resolving merge conflicts with origin/main. Made-with: Cursor
645ccb9 to 51b8f63
Remove the now-dead _reduce_3_tuple helper from cuda_utils.pyx. Made-with: Cursor
Initialize _use_nvjitlink_backend to None so it can serve as its own "already decided" sentinel, eliminating the redundant _driver_ver variable and the driver_version() call that was only used to set it. Made-with: Cursor
Add -> tuple[int, int, int] annotations to binding_version and driver_version. Align driver_version arithmetic with _system.pyx. Made-with: Cursor
rwgk
left a comment
I think it'd be better if "driver version" is a two-tuple. See the long comment.
The cuda-bindings version is a three-tuple, but the driver version is not.
    try:
        parts = importlib.metadata.version("cuda-bindings").split(".")[:3]
    except importlib.metadata.PackageNotFoundError:
        parts = importlib.metadata.version("cuda-python").split(".")[:3]
I'd put a simple sanity check:
assert len(parts) == 3
- Cheap
- Should never fire
- But if it does, it can prevent a lot of confusion
(I'd deliberately not make this any more sophisticated.)
This is covered by the trailing [:3] at lines 15, 17.
    def driver_version() -> tuple[int, int, int]:
        """Return the CUDA driver version as a (major, minor, patch) triple."""
        cdef int ver = handle_return(driver.cuDriverGetVersion())
        return (ver // 1000, (ver // 10) % 100, ver % 10)
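As a quick check of the decoding arithmetic above (the helper name is illustrative, extracted here so it runs without a driver):

```python
def decode_driver_version(ver):
    # cuDriverGetVersion encodes 1000 * major + 10 * minor,
    # so the ones digit (decoded here as "patch") is structurally always 0
    return (ver // 1000, (ver // 10) % 100, ver % 10)
```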
According to the below (ChatGPT Pro, instant), this is "technically correct but semantically misleading." I'd go with Option B.
User Prompt
I'm reviewing this cython code (NVIDIA/cuda-python repo):
@functools.cache
def driver_version() -> tuple[int, int, int]:
    """Return the CUDA driver version as a (major, minor, patch) triple."""
    cdef int ver = handle_return(driver.cuDriverGetVersion())
    return (ver // 1000, (ver // 10) % 100, ver % 10)
On the CTK side I see this (the last digit is always zero, irrespective of the patch version):
$ grep '#define CUDA_VERSION' $cudah
./12.0.1_525.85.12/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12000
./12.1.1_530.30.02/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12010
./12.2.2_535.104.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12020
./12.3.2_545.23.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12030
./12.4.1_550.54.15/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12040
./12.5.1_555.42.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12050
./12.6.3_560.35.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12060
./12.8.1_570.124.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12080
./12.9.1_575.57.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12090
./13.0.2_580.95.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13000
./13.1.1_590.48.01/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13010
./13.2.0_595.45.04/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13020
But what about the driver? Does the last digit there have a significance? Is it ever different from 0?
Assistant Response
Short answer: your instinct is right to question it — the “patch” digit in cuDriverGetVersion() is not meaningful in practice and is effectively always 0.
1. What cuDriverGetVersion() actually returns
From the CUDA driver API, cuDriverGetVersion() returns an integer encoding:
version = 1000 * major + 10 * minor
That’s it. There is no patch component encoded.
Examples:
- CUDA 12.6 → 12060
- CUDA 13.1 → 13010
If decoded as:
(ver // 1000, (ver // 10) % 100, ver % 10)
you get (12, 6, 0) — but that last 0 is just an artifact.
2. Why your CTK observation matches
CUDA_VERSION uses the same encoding:
#define CUDA_VERSION 12060
So:
- Toolkit patch versions (e.g. 12.6.3) are not represented
- The last digit is always 0
3. Driver vs Toolkit versioning
CUDA Toolkit
- Has real patch versions (e.g. 12.6.3)
- CUDA_VERSION does not encode them

CUDA Driver
- Version tied to NVIDIA driver (e.g. 560.35.05)
- cuDriverGetVersion() reports only the CUDA API compatibility level
4. Is the last digit ever non-zero?
No — it is structurally always zero.
5. Practical implication
Your code is technically correct but semantically misleading:
return (ver // 1000, (ver // 10) % 100, ver % 10)
It suggests a meaningful patch version, which does not exist.
6. Recommendation
Option A (preferred):
def driver_version() -> tuple[int, int]:
    return (ver // 1000, (ver // 10) % 100)
Option B (if API stability matters):
Keep the triple but document that patch is always zero.
"""Return (major, minor, patch=0). Patch is always 0 for driver."""
Bottom line
- Patch digit is always zero
- No semantic meaning
- Treat as (major, minor)
Bonus
To get the actual driver version (e.g. 560.35.05), use NVML:
nvmlSystemGetDriverVersion()
I was going to ask whether to make bindings a pair or triple until I found the three-part format in the existing code here.
The danger with mixing these is that comparisons between two- and three-part tuples will give inconsistent and surprising results (see below). Rather than trying to remember which version uses which format, it's better to follow a uniform rule: make every version a three-tuple and always compare against a three-tuple. That's a simple rule everyone can remember, and it makes inconsistencies easy to spot.
>>> (12, 3) <= (12, 3, 0)
True
>>> (12, 3) >= (12, 3, 0)
False
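If two- and three-part tuples ever did have to mix, a tiny normalization helper (hypothetical, not part of this PR) would restore consistent comparisons:

```python
def as_triple(version):
    # Pad a (major, minor) or (major, minor, patch) tuple to three parts
    # so that all comparisons happen between same-length tuples
    return tuple(version) + (0,) * (3 - len(version))
```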
    version = binding_version()
    if version < (12, 9, 0):
        raise RuntimeError(
            f"NVVM bindings require cuda-bindings >= 12.9.0, but found {version[0]}.{version[1]}.x. "
Here is the old code using a three-part version for cuda-bindings.
Summary

Centralizes all internal CUDA version checking into a new cuda/core/_utils/version.{pyx,pxd} module, replacing ad-hoc patterns scattered across multiple files. Both Python (binding_version, driver_version) and Cython (cy_binding_version, cy_driver_version) accessors are provided, each returning cached (major, minor, patch) tuples.

Changes

- Add cuda/core/_utils/version.{pyx,pxd} with:
  - binding_version() / cy_binding_version(): returns (major, minor, patch) from cuda-bindings metadata, cached via @functools.cache (Python) and module-level globals (Cython)
  - driver_version() / cy_driver_version(): returns (major, minor, patch) converted from cuDriverGetVersion, same caching strategy
- Compare versions as three-tuples ((12, 3, 0) instead of 12030 or (12, 3))
- Remove dead code from _module.pyx, _launcher.pyx, and _launch_config.pyx:
  - _lazy_init / _inited / _lock / _use_ex lazy initialization machinery
  - _kernel_ctypes / _get_kernel_ctypes / _is_cukernel_get_library_supported unused helpers
- Format version strings via '.'.join(map(str, version_tuple))
- Remove get_binding_version and get_driver_version from cuda_utils.pyx
- Leave the public system.get_driver_version API unchanged; it serves a different purpose (user-facing system info vs. internal feature gating)

Test Coverage

- Update test_device.py, test_module.py, test_program.py, test_cuda_utils.py, test_optional_dependency_imports.py, and graph/test_explicit.py to use the new version functions
- _program.pyx and _linker.pyx use the Python def versions (binding_version, driver_version) so that test_optional_dependency_imports.py can continue to monkeypatch them for controlled testing

Made with Cursor